Speaker recognition using the resynthesized speech via spectrum modeling

Authors

Xiang Zhang

Chuan Cao

Lin Yang

Hongbin Suo

Jianping Zhang

Yonghong Yan

Abstract

Recently, using prosodic information such as pitch and energy for speaker recognition has attracted much attention. However, such systems perform much worse than traditional cepstral-based systems, and combining the two kinds of systems yields only limited improvement. In this paper, we present a new approach to speaker recognition that uses prosodic information computed from the original speech to resynthesize new speech data with a spectrum modeling technique. The resynthesized speech is modeled with sinusoids parameterized by pitch, vibration amplitude, and phase bias. From the resynthesized speech we extract cepstral features and use them for speaker modeling and scoring in the same way as traditional speaker recognition approaches: the features are modeled with GMMs, and speaker and channel variability are compensated with joint factor analysis (JFA). Experiments are carried out on the core condition of the NIST 2008 Speaker Recognition Evaluation data. The results show that the proposed system achieves performance comparable to a state-of-the-art cepstral-based JFA system that uses the original speech. Moreover, fusing the two systems yields a significant improvement over the cepstral-based system alone.
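The abstract does not give the synthesis equations, so the following is only a rough illustration of what "modeling speech with sinusoids based on pitch, vibration amplitude and phase bias" can look like. It assumes a frame-wise harmonic model: each voiced frame is a sum of sinusoids at integer multiples of that frame's pitch, each with its own amplitude and a fixed phase bias, and the phase of the fundamental is accumulated across frames so the waveform stays continuous. The function name `resynthesize`, the sample rate, the frame hop, and the choice to leave unvoiced frames silent are all hypothetical, not the authors' method.

```python
import math
from typing import List, Sequence


def resynthesize(
    f0_hz: Sequence[float],
    harmonic_amps: Sequence[Sequence[float]],
    phase_bias: Sequence[float],
    sample_rate: int = 8000,
    hop: int = 80,
) -> List[float]:
    """Hypothetical frame-wise harmonic (sum-of-sinusoids) resynthesis.

    f0_hz[t]:            pitch of frame t in Hz; 0.0 marks an unvoiced frame
    harmonic_amps[t][k]: amplitude of harmonic k+1 in frame t
    phase_bias[k]:       fixed phase offset (radians) of harmonic k+1
    """
    if len(f0_hz) != len(harmonic_amps):
        raise ValueError("need one amplitude vector per frame")
    two_pi = 2.0 * math.pi
    nyquist = sample_rate / 2.0
    out: List[float] = []
    phase = 0.0  # running phase of the fundamental, carried across frames
    for f0, amps in zip(f0_hz, harmonic_amps):
        step = two_pi * f0 / sample_rate  # phase advance per sample
        for _ in range(hop):
            sample = 0.0
            if f0 > 0.0:  # unvoiced frames are left silent (assumption)
                for k, (amp, bias) in enumerate(zip(amps, phase_bias), start=1):
                    if k * f0 < nyquist:  # skip harmonics above Nyquist
                        sample += amp * math.cos(k * phase + bias)
            out.append(sample)
            phase = (phase + step) % two_pi
    return out


if __name__ == "__main__":
    # 10 voiced frames at 100 Hz followed by 5 unvoiced frames
    f0 = [100.0] * 10 + [0.0] * 5
    amps = [[1.0, 0.5, 0.25]] * 15
    bias = [0.0, 0.3, -0.2]
    x = resynthesize(f0, amps, bias, sample_rate=8000, hop=80)
    print(len(x))  # 15 frames * 80 samples = 1200
```

In a full pipeline, the pitch contour would come from the original speech, and cepstral features would then be extracted from the returned signal and passed to the GMM/JFA back end. That step is not shown here.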

Similar sources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words

Publication date: 2010
